 pattern discovery



Interactive Multi Interest Process Pattern Discovery

arXiv.org Artificial Intelligence

Existing process pattern discovery methods (PPDMs) are typically unsupervised and focus on a single dimension of interest, such as discovering frequent patterns. We present an interactive multi-interest-driven framework for process pattern discovery aimed at identifying patterns that are optimal according to a multi-dimensional analysis goal. The proposed approach is iterative and interactive, thus taking experts' knowledge into account during the discovery process. The paper focuses on a concrete analysis goal, i.e., deriving process patterns that affect the process outcome. We evaluate the approach on real-world event logs in both interactive and fully automated settings. In the interactive setting, the approach extracted meaningful patterns that were validated by expert knowledge. In the automated settings, the extracted patterns consistently led to prediction performance comparable to or better than that of patterns derived from single-interest dimensions, without requiring user-defined thresholds.
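One common way to read "optimal according to a multi-dimensional analysis goal" is Pareto optimality: a pattern is kept if no other pattern is at least as good on every interest dimension and strictly better on one. The sketch below illustrates that reading only; the abstract does not state the selection rule, and the pattern names and the two interest dimensions (frequency, outcome correlation) are hypothetical.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(patterns):
    """Return the names of patterns that no other pattern dominates.

    patterns maps a pattern name to a tuple of interest scores,
    where higher is better on every dimension (an assumption of this sketch).
    """
    front = []
    for name, scores in patterns.items():
        dominated = any(
            dominates(other_scores, scores)
            for other_name, other_scores in patterns.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical patterns scored as (frequency, outcome correlation).
patterns = {
    "A": (0.9, 0.2),
    "B": (0.5, 0.5),
    "C": (0.4, 0.4),  # dominated by B on both dimensions
    "D": (0.1, 0.9),
}
print(pareto_front(patterns))
```

Selecting a Pareto front needs no per-dimension cut-off, which fits the abstract's point about avoiding user-defined thresholds.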


Real-time Workload Pattern Analysis for Large-scale Cloud Databases

arXiv.org Artificial Intelligence

Hosting database services on cloud systems has become a common practice. This has led to an increasing volume of database workloads, which provides an opportunity for pattern analysis. Discovering workload patterns from a business logic perspective is conducive to better understanding the trends and characteristics of the database system. However, existing workload pattern discovery systems are not suitable for the large-scale cloud databases commonly employed in industry, because the workload patterns of large-scale cloud databases are generally far more complicated than those of ordinary databases. In this paper, we propose Alibaba Workload Miner (AWM), a real-time system for discovering workload patterns in complicated large-scale workloads. AWM encodes and discovers the SQL query patterns logged from user requests and optimizes query processing based on the discovered patterns. First, the Data Collection & Preprocessing Module collects streaming query logs and encodes them into high-dimensional feature embeddings with rich semantic contexts and execution features. Next, the Online Workload Mining Module separates encoded queries by business group and discovers the workload patterns for each group. Meanwhile, the Offline Training Module collects labels and trains the classification model using those labels. Finally, the Pattern-based Optimizing Module optimizes query processing in cloud databases by exploiting the discovered patterns. Extensive experimental results on one synthetic dataset and two real-life datasets (extracted from Alibaba Cloud databases) show that AWM enhances the accuracy of pattern discovery by 66% and reduces the latency of online inference by 22%, compared with the state of the art.


Discovering COVID-19 Coughing and Breathing Patterns from Unlabeled Data Using Contrastive Learning with Varying Pre-Training Domains

arXiv.org Artificial Intelligence

Rapid discovery of new diseases, such as COVID-19, can enable a timely epidemic response, preventing large-scale spread and protecting public health. However, limited research effort has been devoted to this problem. In this paper, we propose a contrastive learning-based modeling approach for COVID-19 coughing and breathing pattern discovery from non-COVID coughs. To validate our models, extensive experiments have been conducted using four large audio datasets and one image dataset. We further explore the effects of different factors, such as domain relevance and augmentation order, on the pre-trained models. Our results show that the proposed model can effectively distinguish COVID-19 coughing and breathing from unlabeled data and labeled non-COVID coughs with an accuracy of up to 0.81 and 0.86, respectively. Findings from this work will guide future research to detect an outbreak of a new disease early.
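For readers unfamiliar with contrastive learning, the following is a minimal sketch of an InfoNCE-style loss for a single anchor. The abstract does not say which contrastive objective the authors use, so the loss form, the temperature value, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss: low when the anchor is closer to its positive
    (e.g. an augmented view of the same recording) than to the negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (hypothetical): the positive is nearly aligned with the anchor.
anchor = [1.0, 0.0]
well_aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(well_aligned < misaligned)
```

Minimizing this loss over many anchors pulls embeddings of matching views together and pushes non-matching ones apart, which is what lets structure emerge from unlabeled audio.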


Beyond Object Identification: A Giant-Leap into Pattern Discovery in Imagery Data

#artificialintelligence

A critical question that arises after identifying the objects (or class labels) in an imagery database is: "How are the various objects discovered in an imagery database correlated with one another?" This article tries to answer this question by providing a generic framework that helps readers discover hidden correlations between objects in the imagery database. (A portion of this article is drawn from our work published in IEEE BIGDATA 2021 [1].) The framework to discover the correlation between the objects in an imagery database is shown in Figure 1. Demonstration: In this demo, we first pass the image data into a trained model (e.g., resnet50) and extract objects and their scores.
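Once objects and scores have been extracted per image, one simple way to look for correlated objects is to count how often pairs of labels appear together. The sketch below shows that idea under stated assumptions; it is not the article's framework. The function name, the score threshold, the minimum support, and the sample detections are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_object_pairs(detections, score_threshold=0.5, min_support=2):
    """Count label pairs that co-occur in at least min_support images.

    detections is a list with one entry per image; each entry is a list of
    (label, score) tuples as a detector or classifier might return them.
    """
    pair_counts = Counter()
    for image_objects in detections:
        # Keep each confident label once per image, in a stable order.
        labels = sorted(
            {label for label, score in image_objects if score >= score_threshold}
        )
        for pair in combinations(labels, 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


# Hypothetical detections for three images.
detections = [
    [("car", 0.9), ("person", 0.8), ("dog", 0.3)],
    [("car", 0.7), ("person", 0.6)],
    [("dog", 0.9), ("person", 0.95)],
]
print(frequent_object_pairs(detections))  # {('car', 'person'): 2}
```

Pair counting is the smallest case of frequent pattern mining; larger itemsets would need a dedicated algorithm rather than this brute-force enumeration.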


Data Science Workshop 2021: 10 Real Projects From Scratch - CouponED

#artificialintelligence

Data Science is a blend of various tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data. But how is this different from what statisticians have been doing for years? The answer lies in the difference between explaining and predicting. A Data Analyst usually explains what is going on by processing the history of the data. A Data Scientist, on the other hand, not only performs exploratory analysis to discover insights from the data, but also uses various advanced machine learning algorithms to identify the occurrence of a particular event in the future.


Pattern Discovery and Validation Using Scientific Research Methods

arXiv.org Artificial Intelligence

Pattern discovery, the process of discovering previously unrecognized patterns, is often performed as an ad-hoc process with little resulting certainty in the quality of the proposed patterns. Pattern validation, the process of validating the accuracy of proposed patterns, remains dominated by the simple heuristic of "the rule of three". This article shows how to use established scientific research methods for the purpose of pattern discovery and validation. We present a specific approach, called the handbook method, that uses the qualitative survey, action research, and case study research for pattern discovery and evaluation, and we discuss the underlying principle of using scientific methods in general. We evaluate the handbook method using three exploratory studies and demonstrate its usefulness.


Letters to the editor

#artificialintelligence

Artificial intelligence is an oxymoron (Technology quarterly, June 13th). Intelligence is an attribute of living things, and can best be defined as the use of information to further survival and reproduction. When a computer resists being switched off, or a robot worries about the future for its children, then, and only then, may intelligence flow. I acknowledge Richard Sutton's "bitter lesson", that attempts to build human understanding into computers rarely work, although there is nothing new here. I was aware of the folly of anthropomorphism as an AI researcher in the mid-1980s.


The Limitations of Neural Networks

#artificialintelligence

"Neural networks are faced with three big issues…" Today, neural networks dominate the landscape of AI and AIOps; the question I pose is whether this is justifiable and sustainable, writes Will Cappelli, CTO EMEA and Global VP of Product Strategy at Moogsoft. Let's look at this commercially. Within the context of AIOps, neural networks have peaked in their ability to deliver effective and meaningful results. There are a number of limiting issues that relate directly to neural network algorithms, and it is my belief that these cannot be changed. I would say that neural networks are faced with three big issues, and these apply to everything from single-layer neural networks to multi-layer networks.


Solving the "Data Explosion" Problem with University of Illinois Data Mining Pioneer Jiawei Han - Coursera Blog

#artificialintelligence

Jiawei Han, a professor of computer science at the University of Illinois at Urbana-Champaign, was recently named a Michael Aiken Chair, one of the University's highest awards. The endowed chair is the latest honor in Han's distinguished and pioneering career, with notable accomplishments including creating core data mining algorithms and co-authoring the textbook that is considered by many to have defined the field. Professor Han is also a busy and successful teacher with a love for "train[ing] the younger generation, whether at UIUC or all over the world on Coursera." Professor Han had three PhD students graduate in May, with one becoming a professor at Georgia Tech, one joining Google, and one joining Facebook. Students taking his classes as part of the Online Master of Computer Science in Data Science degree have an opportunity to learn from him through videos and can ask him questions directly during live office hours.